HDDS-15601. Coalesce contiguous container IDs into range deletes in ContainerHealthSchemaManager by umesh9794 · Pull Request #10549 · apache/ozone

umesh9794 · 2026-06-19T10:49:06Z

What changes were proposed in this pull request?

Coalesce contiguous container IDs into range deletes in ContainerHealthSchemaManager

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-15601

How was this patch tested?

Added unit test to cover the code changes

…ontainerHealthSchemaManager ContainerHealthSchemaManager.deleteScmStatesForContainers (used by the Recon container health scan via replaceUnhealthyContainerRecordsAtomically) previously issued one chunked DELETE ... WHERE container_id IN (...) per 1,000 IDs, regardless of how the IDs were distributed. For large contiguous ID ranges this generated many statements where a single range delete would suffice. Sort and de-duplicate the IDs, then split them into maximal contiguous runs: - runs of at least MIN_RANGE_DELETE_RUN (one IN-chunk worth) are removed with a single constant-size DELETE ... WHERE container_id BETWEEN lo AND hi - shorter runs and scattered IDs continue to use bytecode-safe chunked IN deletes This keeps the common production case (dense ID ranges with gaps from deleted containers) to a handful of statements while never doing worse than plain IN chunking for sparse IDs. The 7-state SCM filter is applied on both paths, so non-SCM states such as ALL_REPLICAS_BAD are preserved as before. Add TestContainerHealthSchemaManager covering contiguous, scattered, mixed, and unsorted/duplicate ID inputs, and asserting non-SCM states inside a deleted range are retained.

umesh9794 · 2026-06-19T10:52:49Z

@devmadhuu @adoroszlai please take a look at this PR. Its part of changes suggested in #10532 10532

adoroszlai · 2026-06-19T11:08:23Z

Thanks @umesh9794 for the patch. Next time please keep the original PR instead of opening a new one. Now discussion will be split between two PRs.

adoroszlai · 2026-06-19T11:49:20Z

+        totalDeleted += deleteScmStatesByRange(dslContext,
+            sortedIds.get(runStart), sortedIds.get(i));


What about the suggestion to include multiple BETWEEN ranges in the same delete statement?

I was trying to keep the DELETE statements with BETWEEN and IN clause separate to issue 2 DELETE statements. So in the worst case it falls back to the same IN chunking the code used before this change, so it's never worse than the old baseline of ceil(n / 1000). In best case it will issue just 1 statement for a fully contiguous block.

As the comments says above combining both might overflow Derby's per-statement bytecode limit (Bytecode per method).

If you prefer, I can combine it both into one. Thanks!

Using BETWEEN instead of IN for contiguous range of IDs is a good idea.

But I don't think the current change has much effect on production behavior, a range of 1000+ "unhealthy" containers in Recon is not real life scenario. So it adds complexity without benefit.

We should apply this logic for shorter ranges, without executing too many small statements. For that, we need to find the threshold (number of container IDs) over which BETWEEN n AND n+k OR yields shorter code than the corresponding IN clause. Any number of IDs beyond that is a win.

Sure, let me apply that change here, thanks.

Using BETWEEN instead of IN for contiguous range of IDs is a good idea.

But I don't think the current change has much effect on production behavior, a range of 1000+ "unhealthy" containers in Recon is not real life scenario. So it adds complexity without benefit.

We should apply this logic for shorter ranges, without executing too many small statements. For that, we need to find the threshold (number of container IDs) over which BETWEEN n AND n+k OR yields shorter code than the corresponding IN clause. Any number of IDs beyond that is a win.

@adoroszlai I have applied the changes, please take a look. Thanks!

umesh9794 · 2026-06-22T04:20:58Z

Thanks @umesh9794 for the patch. Next time please keep the original PR instead of opening a new one. Now discussion will be split between two PRs.

I agree, actually that was my mistake. I reverted the commit and pushed in the same branch which accidentally closed the PR which was unintentional.

…rHealthSchemaManager Coalesce runs of 3+ IDs into combined BETWEEN/IN predicates bounded by an operand budget, and extend TestContainerHealthSchemaManager to cover combined and multi-statement deletes.

devmadhuu

Thanks @umesh9794 for the patch. Functionally by seeing code, looks fine, but we need to consider few real non-functional issues which can break things in production cluster:

Earlier old code was safely putting 1000 Ids under IN clause, so it was tested and known that it will not bring Derby's 64KB limit issue as all 1000 Ids were used to stay same in count and predicted with size of compiled Derby byte code. I mean , it was assuring that keep each SQL command small enough that Derby's secret program stays under 64 KB. Question is now is this new logic brings that safety when at run time it will determine the number of container ids with mix n match of IN and BETWEEN ? I have a doubt about that and in that case Derby may crash.
The new derby statement formed in a more complicated way: Lets take an analogy: "find everyone in the range 1–100, OR everyone in range 150–2000, OR these specific scattered container ids." When a database sees a question full of "OR ... OR ... OR," it sometimes gives up on using the index and instead just reads the entire table from row to row to be safe. That's called a full table scan, and it's much slower.

On top of being slower, this delete runs inside a transaction where the database locks rows while it works. If it scans the whole table instead of jumping to specific rows, it may lock far more than it needs to, which can slow down or block other things happening at the same time.

So the change that was meant to make things faster could, depending on how Derby decides to run it, actually make the delete slower and heavier. There's an existing performance test suite that would reveal this, but the PR doesn't mention running it — it only mentions the new correctness test.
3. The whole optimization only helps when IDs are next to each other (like 100, 101, 102, 103...). That's when you can collapse them into a tidy range.

But in real life, unhealthy container IDs are usually scattered (like 14, 88, 230, 911...). The PR's own description even admits it's aiming at "fragmented" data. Here's the irony: scattered data has no runs to collapse. So the clever new logic finds nothing to optimize and falls back to behaving like the old code anyway. So in the common case, you've added cost and effort for zero benefit. If you have real production kind of test which can prove with figures, please publish those stats.

The old version of this logic was about 15 lines and dead simple: "chop the list into groups of 1,000 and delete each group." Boring, but easy to read, easy to trust, and it already fully solved the only real requirement (stay under Derby's limit).

The new version is about 100 lines with a genuinely clever algorithm (detecting runs, mixing ranges and lists, tracking an operand budget, flushing batches). Clever code is not free — it's harder to read, harder to test, and easier to break later when someone else touches it.

adoroszlai changed the title ~~HDDS-15582. Coalesce contiguous container IDs into range deletes in ContainerHealthSchemaManager~~ HDDS-15601. Coalesce contiguous container IDs into range deletes in ContainerHealthSchemaManager Jun 19, 2026

adoroszlai reviewed Jun 19, 2026

View reviewed changes

adoroszlai requested a review from devmadhuu June 22, 2026 05:51

Apply range deletes to short contiguous container ID runs in Containe…

a386fb0

…rHealthSchemaManager Coalesce runs of 3+ IDs into combined BETWEEN/IN predicates bounded by an operand budget, and extend TestContainerHealthSchemaManager to cover combined and multi-statement deletes.

devmadhuu reviewed Jun 23, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

HDDS-15601. Coalesce contiguous container IDs into range deletes in ContainerHealthSchemaManager#10549

HDDS-15601. Coalesce contiguous container IDs into range deletes in ContainerHealthSchemaManager#10549
umesh9794 wants to merge 2 commits into
apache:masterfrom
umesh9794:HDDS-15582

umesh9794 commented Jun 19, 2026

Uh oh!

umesh9794 commented Jun 19, 2026

Uh oh!

adoroszlai commented Jun 19, 2026

Uh oh!

adoroszlai Jun 19, 2026

Uh oh!

umesh9794 Jun 22, 2026

Uh oh!

adoroszlai Jun 22, 2026

Uh oh!

umesh9794 Jun 22, 2026

Uh oh!

umesh9794 Jun 22, 2026

Uh oh!

umesh9794 commented Jun 22, 2026

Uh oh!

devmadhuu left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		totalDeleted += deleteScmStatesByRange(dslContext,
		sortedIds.get(runStart), sortedIds.get(i));

Uh oh!

Conversation

umesh9794 commented Jun 19, 2026

What changes were proposed in this pull request?

What is the link to the Apache JIRA

How was this patch tested?

Uh oh!

umesh9794 commented Jun 19, 2026

Uh oh!

adoroszlai commented Jun 19, 2026

Uh oh!

adoroszlai Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

umesh9794 Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

adoroszlai Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

umesh9794 Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

umesh9794 Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

umesh9794 commented Jun 22, 2026

Uh oh!

devmadhuu left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants